Explicit Planning for Efficient Exploration in Reinforcement Learning
Zhang, Liangpeng, Tang, Ke, Yao, Xin
Efficient exploration is crucial to achieving good performance in reinforcement learning. Existing systematic exploration strategies (R-MAX, MBIE, UCRL, etc.), despite their theoretical promise, are essentially greedy strategies that follow predefined heuristics. When the heuristics do not match the dynamics of Markov decision processes (MDPs) well, an excessive amount of time can be wasted travelling through already-explored states, lowering the overall efficiency. We argue that explicit planning for exploration can alleviate this problem, and propose the Value Iteration for Exploration Cost (VIEC) algorithm, which computes the optimal exploration scheme by solving an augmented MDP. We then present a detailed analysis of the exploration behaviour of several popular strategies, showing how they can fail and spend O(n^2 md) or O(n^2 m + nmd) steps to collect sufficient data in some tower-shaped MDPs, while the optimal exploration scheme, obtainable by VIEC, needs only O(nmd), where n and m are the numbers of states and actions and d is the data demand. The analysis not only points out the weakness of existing heuristic-based strategies, but also suggests remarkable potential in explicit planning for exploration.
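To make the augmented-MDP idea concrete, here is a minimal sketch of value iteration for exploration cost, assuming a known, deterministic model; the `step`/`consume` helpers and the dictionary-based demand representation are our illustrative choices, not the paper's implementation (which handles general stochastic MDPs).

```python
import itertools

def viec_sketch(states, actions, step, demand):
    """Minimal sketch of Value Iteration for Exploration Cost (VIEC).

    Assumes a known, deterministic model for simplicity. step(s, a) -> s'
    is the transition function (it must map states in `states` back into
    `states`), and `demand` maps (state, action) pairs to the minimum
    number of times each pair must still be tried. Returns V, where
    V[(s, d)] is the minimal number of steps needed to satisfy the
    remaining demand vector d when starting from state s.
    """
    pairs = sorted(demand)  # fix an order for the demand vector
    ranges = [range(demand[p] + 1) for p in pairs]
    aug = [(s, d) for s in states for d in itertools.product(*ranges)]

    def consume(d, s, a):
        # taking (s, a) satisfies one unit of its remaining demand, if any
        d = list(d)
        if (s, a) in demand:
            i = pairs.index((s, a))
            d[i] = max(0, d[i] - 1)
        return tuple(d)

    INF = float("inf")
    # zero cost once every demand is met; unknown (infinite) otherwise
    V = {(s, d): (0 if not any(d) else INF) for s, d in aug}

    # value iteration on the augmented MDP; since all steps cost 1, an
    # optimal scheme never revisits an augmented state, so |aug| sweeps
    # are enough for the Bellman updates to converge
    for _ in range(len(aug)):
        for s, d in aug:
            if not any(d):
                continue
            V[(s, d)] = min(
                1 + V[(step(s, a), consume(d, s, a))] for a in actions
            )
    return V
```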
Author Feedback (df3aebc649f9e3b674eeb790a4da224e-AuthorFeedback.pdf)
Table 1: Robustness to model mismatch. Top-1 accuracy of SIPS at the third time quartile (Q3), evaluated on data generated by humans, RL agents, and mismatched models. We ran SIPS assuming r = 2, q = 0.95, T = 10, and a Manhattan heuristic (h); matched parameters are starred (*).
We thank the reviewers for engaging carefully with our paper, and for providing helpful and constructive feedback. We will expand on these experiments in the final paper with more domains and cross-method comparisons.
Reviews: Explicit Planning for Efficient Exploration in Reinforcement Learning
This paper introduces the interesting idea of demand matrices for doing pure exploration more efficiently. A demand matrix simply specifies the minimum number of times each state-action pair must be visited. The remaining demand is then treated as an additional part of the state in an augmented MDP, which can be solved to derive the optimal exploration strategy for satisfying the specified initial demand. While the idea is interesting and solid, there are downsides to the approach itself, and some of the analysis in this paper could be improved. In particular, there are no theoretical guarantees that using this algorithm together with a simultaneously learned model will work.
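To illustrate the demand-matrix mechanism the review describes, here is a toy run of the VIEC sketch above (our own two-state example, not one from the paper or the review):

```python
# toy usage of viec_sketch: a two-state chain where action 'a' toggles the
# state and 'b' stays put; the demand matrix asks for (state 0, action 'a')
# to be tried twice before exploration is considered done
V = viec_sketch(
    states=[0, 1],
    actions=['a', 'b'],
    step=lambda s, a: 1 - s if a == 'a' else s,
    demand={(0, 'a'): 2},
)
print(V[(0, (2,))])  # 3: take 'a', toggle back with 'a' from state 1, take 'a' again
```

Each transition decrements the matching entry of the demand vector, and the value of an augmented state (s, d) is exactly the number of steps the optimal exploration scheme still needs from there.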
Explicit Planning Helps Language Models in Logical Reasoning
Zhao, Hongyu, Wang, Kangrui, Yu, Mo, Mei, Hongyuan
Language models have been shown to perform remarkably well on a wide range of natural language processing tasks. In this paper, we propose LEAP, a novel system that uses language models to perform multi-step logical reasoning and incorporates explicit planning into the inference procedure. Explicit planning enables the system to make more informed reasoning decisions at each step by looking ahead into their future effects. Moreover, we propose a training strategy that safeguards the planning process from being led astray by spurious features. Our full system significantly outperforms other competing methods on multiple standard datasets. When using small T5 models as its core selection and deduction components, our system performs competitively compared to GPT-3 despite having only about 1B parameters (i.e., 175 times smaller than GPT-3). When using GPT-3.5, it significantly outperforms chain-of-thought prompting on the challenging PrOntoQA dataset. We have conducted extensive empirical studies to demonstrate that explicit planning plays a crucial role in the system's performance.
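As a rough illustration of what "looking ahead into their future effects" means here, below is a minimal lookahead-selection sketch; `deduce`, `score`, and the toy usage are our placeholders, standing in for LEAP's learned selection and deduction modules, not the paper's code.

```python
def plan_step(state, candidates, deduce, score, depth=2):
    """Rough sketch of lookahead-based step selection for multi-step reasoning.

    `state` is the current set of derived facts, `candidates` the possible
    next deduction steps, deduce(state, step) applies a step, and
    score(state) estimates how promising a state is w.r.t. the goal.
    """
    def lookahead(s, d):
        if d == 0:
            return score(s)
        # score a state by the best continuation reachable within d more steps
        return max((lookahead(deduce(s, c), d - 1) for c in candidates),
                   default=score(s))

    # planning: choose the immediate step with the best d-step future,
    # instead of greedily trusting the one-step score
    return max(candidates, key=lambda c: lookahead(deduce(state, c), depth - 1))

# toy usage: "facts" are numbers, steps add a value, goal is to reach 10
best = plan_step(0, [1, 3, 5],
                 deduce=lambda s, c: s + c,
                 score=lambda s: -abs(10 - s))
print(best)  # 5: its two-step continuations can hit the goal exactly (5 + 5)
```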
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > New York (0.04)
- Workflow (0.66)
- Research Report (0.64)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.75)